# Open-source Model

Gemma 3
Gemma 3 is Google's latest open-source model, built on the research and technology behind Gemini 2.0. It is a lightweight, high-performance model that can run on a single GPU or TPU. Gemma 3 comes in four sizes (1B, 4B, 12B, and 27B), supports over 140 languages, and offers advanced text and visual reasoning capabilities. Its key advantages are high performance, low computational requirements, and broad multilingual support, which make it suitable for rapid deployment of AI applications on a wide range of devices. The release aims to promote AI adoption and innovation and to help developers build efficiently across different hardware platforms.
AI Model
95.2K
Hibiki
Hibiki is a model for streaming speech translation. It accumulates enough context to translate accurately in real time, produces both speech and text translations, and supports voice conversion. The model uses a multi-stream architecture that processes source and target speech simultaneously, producing a continuous audio stream along with timestamped text translations. Its main advantages include high-fidelity voice conversion, low-latency real-time translation, and compatibility with complex reasoning strategies. Hibiki currently supports French-to-English translation and suits real-time scenarios such as international conferences and multilingual live events. The model is free and open-source, which makes it a good fit for developers and researchers.
Translation
59.3K
Fresh Picks
YuE
YuE is an open-source music generation model developed by the Hong Kong University of Science and Technology and the Multimodal Art Projection (M-A-P) team. Given a set of lyrics, it can generate full songs up to 5 minutes long, including vocals and accompaniment. The model tackles the difficulties of lyrics-to-song generation with several techniques, including a semantically enhanced audio tokenizer, a dual-token technique, and lyrics chain-of-thought. Its main advantages are high-quality musical output and support for multiple languages and music styles, with strong scalability and controllability. The model is currently free and open-source, with the aim of advancing music generation technology.
Music Generation
94.1K
MatterGen
Launched by Microsoft Research, MatterGen is a generative AI tool for material design. It can directly generate new materials with specific chemical, mechanical, electronic, or magnetic properties based on application design requirements, providing a new paradigm for material exploration. This tool is expected to accelerate the R&D process for novel materials, lower R&D costs, and play a significant role in fields such as batteries, solar cells, and CO2 adsorbents. Currently, MatterGen's source code is open-sourced on GitHub for public use and further development.
Research Equipment
70.1K
Kokoro-82M
Kokoro-82M is a text-to-speech (TTS) model created by hexgrad and hosted on Hugging Face. It has 82 million parameters and is open-sourced under the Apache 2.0 license. Version 0.19 was released on December 25, 2024, with 10 distinct voice packs. Kokoro-82M ranks first in the TTS Spaces Arena, which highlights how efficiently it uses parameters and training data. It supports both American and British English and is suitable for generating high-quality speech output.
Text to Speech
118.7K
Allegro-TI2V
Allegro-TI2V is a text-and-image-to-video generation model that creates video content from user-provided prompts and images. It is notable for being open source, for its diverse content creation capabilities and high-quality output, for its compact and efficient parameter count, and for supporting multiple precisions and GPU memory optimizations. It reflects recent advances in AI video generation and has significant technical value and commercial potential. The Allegro-TI2V model is available on Hugging Face under the Apache 2.0 open-source license, so users can download and use it for free.
Video Production
65.4K
Qwen2.5-Coder-32B-Instruct-AWQ
Qwen2.5-Coder is a series of large language models optimized for code, available in six mainstream sizes (0.5, 1.5, 3, 7, 14, and 32 billion parameters) to meet the varied needs of developers. It shows significant improvements in code generation, code reasoning, and debugging. Built on the Qwen2.5 backbone, it scales training to 5.5 trillion tokens, including source code, text-code grounding data, and synthetic data, which makes it one of the most advanced open-source code LLMs, with coding capabilities comparable to GPT-4o. Qwen2.5-Coder also offers a more comprehensive foundation for real-world applications such as code agents.
Code Inference
49.4K
Qwen2.5-Coder-1.5B
Qwen2.5-Coder-1.5B is a model in the Qwen2.5-Coder series of code-focused large language models, targeting code generation, reasoning, and debugging. Built on the Qwen2.5 architecture, the series scales training to 5.5 trillion tokens, including source code, text-code grounding data, and synthetic data. The series ranks among the leading open-source code LLMs, with its largest model described as rivaling GPT-4o's coding capabilities. Qwen2.5-Coder-1.5B also has improved mathematical and general capabilities, providing a more comprehensive foundation for practical applications such as code agents.
Coding Assistant
50.0K
Chinese Picks
Tencent Hunyuan 3D
Tencent Hunyuan 3D is an open-source 3D generation model designed to address the slow generation speed and weak generalization of existing 3D generation models. It uses a two-stage approach: the first stage rapidly generates multi-view images with a multi-view diffusion model, and the second stage quickly reconstructs 3D assets with a feed-forward reconstruction model. The Hunyuan 3D-1.0 model helps 3D creators and artists automate the production of 3D assets. It supports fast single-image 3D generation and completes end-to-end production, including mesh and texture extraction, within 10 seconds.
3D modeling
98.0K
hertz-dev
hertz-dev is a full-duplex, audio-only transformer foundation model open-sourced by Standard Intelligence, with 8.5 billion parameters. It represents scalable cross-modal learning technology and can convert mono 16 kHz speech into an 8 Hz latent representation at a bitrate of 1 kbps, outperforming other audio encoders. Its key advantages include low latency, high efficiency, and accessibility for researchers who want to fine-tune and build on it. Standard Intelligence says it is committed to developing general intelligence that benefits humanity and describes hertz-dev as a substantial step in that direction.
Model Training and Deployment
52.4K
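The hertz-dev figures above imply a simple per-frame bit budget. The following is a minimal sketch of that arithmetic, not anything taken from the hertz-dev codebase. It assumes that 1 kbps means 1,000 bits per second and, for the comparison with raw audio only, that the 16 kHz mono input is 16-bit PCM; the description does not state a bit depth. All function names are hypothetical.

```python
# Figures taken from the hertz-dev description.
SAMPLE_RATE_HZ = 16_000       # mono input speech
LATENT_RATE_HZ = 8            # latent frames per second
LATENT_BITRATE_BPS = 1_000    # assumption: 1 kbps = 1,000 bits per second

# Assumption for comparison only: 16-bit PCM input (not stated in the source).
ASSUMED_PCM_BITS_PER_SAMPLE = 16


def bits_per_latent_frame(bitrate_bps: int, frame_rate_hz: int) -> float:
    """Bits available to encode each latent frame."""
    return bitrate_bps / frame_rate_hz


def samples_per_latent_frame(sample_rate_hz: int, frame_rate_hz: int) -> int:
    """Number of input audio samples summarized by one latent frame."""
    return sample_rate_hz // frame_rate_hz


def compression_ratio(sample_rate_hz: int, bits_per_sample: int,
                      latent_bitrate_bps: int) -> float:
    """Raw PCM bitrate divided by the latent bitrate."""
    return (sample_rate_hz * bits_per_sample) / latent_bitrate_bps


if __name__ == "__main__":
    print("bits per latent frame:",
          bits_per_latent_frame(LATENT_BITRATE_BPS, LATENT_RATE_HZ))
    print("samples per latent frame:",
          samples_per_latent_frame(SAMPLE_RATE_HZ, LATENT_RATE_HZ))
    print("compression vs. assumed 16-bit PCM:",
          compression_ratio(SAMPLE_RATE_HZ, ASSUMED_PCM_BITS_PER_SAMPLE,
                            LATENT_BITRATE_BPS))
```

Under these assumptions, each latent frame covers 2,000 input samples (125 ms of audio) and carries 125 bits, roughly a 256-fold reduction relative to 16-bit PCM.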
English Picks
Mochi 1
Mochi 1 is an open-source video generation model released by Genmo as a research preview, aiming to address fundamental problems in current AI video generation. The model is known for its motion quality, strong prompt adherence, and ability to cross the uncanny valley by generating coherent, fluid human actions and expressions. Mochi 1 was developed in response to growing demand for high-quality video content, particularly in the gaming, film, and entertainment industries. A free trial is currently available; detailed pricing information is not provided.
Video Production
64.0K
Janus
Janus is an innovative autoregressive framework that addresses the limitations of previous methods by decoupling visual encoding into distinct pathways while utilizing a single, unified transformer architecture for processing. This decoupling not only alleviates the role conflict of the visual encoder in understanding and generation but also enhances the framework's flexibility. Janus outperforms earlier unified models and matches or exceeds the performance of task-specific models. Its simplicity, high flexibility, and effectiveness make it a strong candidate for next-generation unified multimodal models.
Model Training and Deployment
51.9K
Fresh Picks
CogVideoX
CogVideoX is an open-source video generation model that shares its lineage with commercial models and generates video content from text descriptions. It reflects recent advances in text-to-video generation and can produce high-quality videos for fields such as entertainment, education, and commercial promotion.
AI Video Generation
75.3K
Featured AI Tools
Flow AI
Flow is an AI-driven movie-making tool designed for creators, utilizing Google DeepMind's advanced models to allow users to easily create excellent movie clips, scenes, and stories. The tool provides a seamless creative experience, supporting user-defined assets or generating content within Flow. In terms of pricing, the Google AI Pro and Google AI Ultra plans offer different functionalities suitable for various user needs.
Video Production
43.1K
NoCode
NoCode is a platform that requires no programming experience, allowing users to quickly generate applications by describing their ideas in natural language, aiming to lower development barriers so more people can realize their ideas. The platform provides real-time previews and one-click deployment features, making it very suitable for non-technical users to turn their ideas into reality.
Development Platform
46.1K
ListenHub
ListenHub is a lightweight AI podcast generation tool that supports both Chinese and English. Based on cutting-edge AI technology, it can quickly generate podcast content of interest to users. Its main advantages include natural dialogue and ultra-realistic voice effects, allowing users to enjoy high-quality auditory experiences anytime and anywhere. ListenHub not only improves the speed of content generation but also offers compatibility with mobile devices, making it convenient for users to use in different settings. The product is positioned as an efficient information acquisition tool, suitable for the needs of a wide range of listeners.
AI
43.6K
MiniMax Agent
MiniMax Agent is an intelligent AI companion built on the latest multimodal technology. Its MCP-based multi-agent collaboration lets teams of AI agents solve complex problems efficiently. It provides features such as instant answers, visual analysis, and voice interaction, and is claimed to increase productivity tenfold.
Multimodal technology
45.3K
Chinese Picks
Tencent Hunyuan Image 2.0
Tencent Hunyuan Image 2.0 is Tencent's latest AI image generation model, with significantly improved generation speed and image quality. Using a codec with an ultra-high compression ratio and a new diffusion architecture, it can generate images in milliseconds, eliminating the wait typical of traditional generation. The model also improves image realism and detail by combining reinforcement learning algorithms with human aesthetic knowledge, which makes it suitable for professional users such as designers and creators.
Image Generation
44.2K
OpenMemory MCP
OpenMemory is an open-source personal memory layer that provides private, portable memory management for large language models (LLMs). It ensures users have full control over their data, maintaining its security when building AI applications. This project supports Docker, Python, and Node.js, making it suitable for developers seeking personalized AI experiences. OpenMemory is particularly suited for users who wish to use AI without revealing personal information.
open source
43.9K
FastVLM
FastVLM is an efficient visual encoding model designed specifically for vision-language models. It uses the FastViTHD hybrid visual encoder to reduce both the time needed to encode high-resolution images and the number of output tokens, delivering strong speed and accuracy. FastVLM is positioned to give developers powerful vision-language processing capabilities across a variety of scenarios, and it performs especially well on mobile devices that require rapid response.
Image Processing
42.0K
Chinese Picks
LiblibAI
LiblibAI is a leading Chinese AI creative platform offering powerful AI creative tools to help creators bring their imagination to life. The platform provides a vast library of free AI creative models, allowing users to search and utilize these models for image, text, and audio creations. Users can also train their own AI models on the platform. Focused on the diverse needs of creators, LiblibAI is committed to creating inclusive conditions and serving the creative industry, ensuring that everyone can enjoy the joy of creation.
AI Model
6.9M
AIbase
Empowering the Future, Your AI Solution Knowledge Base
© 2025 AIbase